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MEMORY CONTROLLER FOR HANDLING MULTIPLE CLIENTS AND 

METHOD THEREOF 

CO-PENDING APPLICATIONS 

This invention is related to co-pending patent applications entitled "METHOD AND 
5 APPARATUS FOR STORING AND DISPLAYING VIDEO IMAGE DATA IN A VIDEO 
GRAPHICS SYSTEM", which has an application number 09/186,034 and the filing date of 
November 3, 1998 and "METHOD AND APPARATUS FOR MEMORY ACCESS 
SCHEDULING IN A VIDEO GRAPHICS SYSTEM", application number 09/224,692 and 
the filing date of January 4, 1999, and United States Patent Application Serial No. 
10 XX/XXX,XXX entitled "TILED MEMORY CONFIGURATION FOR MAPPING VIDEO 
DATA AND METHOD THEREOF" having Attorney Docket Number ATL0001 660 and filed 
on even date herewith. 

FIELD OF THE INVENTION 

The present invention relates generally to memory and more particularly to mapping data 
15 in memory. 

BACKGROUND OF THE INVENTION 

As modern society progresses into the digital revolution, multimedia has become more 
prevalent. Music is digitally encoded into compact discs (CDs) as well as digital files, such as 
20 with Motion Pictures Experts Group Layer 3 (MP3) format. Movie audio and video is encoded 
into digital video disks (DVDs). Television video and audio is now being recorded digitally in 
select areas. Analog television is being replaced by digitally encoded television, such as 
standard definition television (SDTV) and high definition television (HDTV). 

Individual digital media types must be decoded electronically. Each digital media type 
25 has its own decoding scheme. To access available media, individual decoders are used. To 
allow a single system to offer access to many of the digital media available, the individual media 
decoders are integrated into the system. Many of the media types require intensive processing 
and high bandwidth, such as with Motion Pictures Experts Group (MPEG) video decoding. A 
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system is heavily burdened when processing requests from multiple media decoders. The time 
for a system to process a single request for a single media decoder can substantially increase 
when multiple decoders are being used. The delay to process a request can exceed the media 
decoder's limits. When a system does not adequately handle a request in the time allotted from a 
5 media decoder, the media is not properly decoded and the system has failed in providing service 
for that media. 

Therefore, a system that overcomes these problems would be useful. 

BRIEF DESCRIPTION OF THE DRAWINGS 

Various objects, advantages, features and characteristics of the present invention, as well 
10 as methods, operation and functions of related elements of structure, and the combination of 
parts and economies of manufacture, will become apparent upon consideration of the following 
description and claims with reference to the accompanying drawings, all of which form a part of 
this specification, wherein like reference numerals designate corresponding parts in the various 
figures, and wherein: 

15 FIG. 1 is a block diagram of a system for handling memory requests from a plurality of 

clients and routing them to multiple memory devices, according to an embodiment of the present 
invention; 

FIG. 2 is a flow chart illustrating how multiple client requests can be distributed among 
multiple memory devices, according to a specific embodiment of the present invention; 

20 FIG. 3 is a diagram illustrating a method of tiling memory, according to one embodiment 

of the present invention; 

FIG. 4 is a diagram illustrating how Luma ( Y) image data and Chroma (UV) image data, 
pertaining to a single portion of an image, are stored in subsequent tiles of memory, according to 
another embodiment of the present invention; 

25 FIG. 5 is a block diagram illustrating how Luma (Y) image data and Chroma (UV) image 

data is collected in separate memory planes; and 



2 



ATL0001670 



FIG. 6 is a block diagram illustrating how Luma (Y) image data and Chroma (UV) image 
data from separate image frames are collected and stored. 

DETAILED DESCRIPTION OF THE FIGURES 

At least one embodiment of the present invention provides for a method of mapping data, 
5 related to a media-decoding client, in memory. The method comprises storing data in a tiled- 
surface format. The tiled-surface format is a logical arrangement of tiles representing a plurality 
of pages within memory banks. The tiles are arranged such that pages of data for sequential 
retrieval are implemented in different memory banks. The method further comprises retrieving 
the data from memory. An advantage of at least one embodiment of the present invention is that 
1 0 requests from multiple MPEG decoders can be handled by a single device. Another advantage of 
at least one embodiment of the present invention is that multiple memories can be used to 
efficiently handle requests from a plurality of clients. 

Referring to FIG. 1, a system for routing requests from multiple clients to independent 
memory devices is shown, according to one embodiment of the present invention. Memory 

15 requests made by clients 105 are routed to one of two memory controllers, memory controller 
140 or memory controller 150. The memory controllers route the individual requests to their 
respective memory devices, dynamic random access memory (DRAM) 160 for memory 
controller 140 and double data rate dynamic random access memory (DDRDRAM) 170 for 
memory controller 150. In a specific embodiment of the present invention, memory blocks 1 60 

20 and 170 include the same memory types. 

Clients 1 05 represent multimedia components that require access to memory to run their 
processes. Clients 105 can include multiple clients of different types, such as first graphics 
display 112 and first audio decoder 1 1 5, as well as multiple clients of the same type, such as first 
Motion Pictures Experts Group (MPEG) decoder 1 1 0 and second MPEG decoder 111. In one 
25 embodiment, clients 105 include multiple decoders to handle video and audio, as part of an 
information handling system. First MPEG decoder 1 10 and second MPEG decoder 1 1 1 handle 
video decoding, such as for digital video disk (DVD) video data and digital television (DTV) 
data. In one embodiment, first MPEG decoder 110 decodes standard definition television 
(SDTV) video data and second MPEG decoder 1 1 1 decodes high definition television (HDTV) 
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video data. First graphics display 112 and second graphics display 113 decode graphics, such as 
text and general information handling system graphics. Graphical user interface (GUI) 114 
decodes information for display on a monitor. First audio decoder 1 1 5 and second audio decoder 
120 decode audio data from digital formats, such as MPEG layer-3 (MP3) audio or compact disc 
5 (CD) audio. In operation, each of clients 1 05 makes calls, or requests, to write information to or 
read information from particular memory addresses. It should be appreciated that other clients 
may also be included under clients 105, such as streaming video and audio decoders, without 
departing from the present scope of the invention. 

A router 130 receives requests for memory access generated by clients 105. Router 130 
1 0 controls which memory controller the request should be sent to, and can route the requests based 
on the memory address a client 105 is attempting to access. In one embodiment, each of the 
client requests is given a "tag". A tag is a client-specific identifier attached to each client 
request. The tag can provide information about the client that initiated the request as well as the 
purpose of the request. The client tag information can be used to differentiate among the client 
15 requests and determine to which memory controller the router 130 should route the request. 
Client tags can be used to ensure that the bandwidth needs of the client request are properly met 
by memory controllers 140 and 150. 

Different memory blocks can be used to access memory with higher bandwidth or a faster 
data rate. For example, in one embodiment, second MPEG decoder 1 1 1 is used for decoding 

20 high bandwidth HDTV video data. To meet these needs efficiently, DDRDRAM 170 can be 
used with a larger bus width than the bus width used by DRAM 160. Any clients 105 which 
need more bandwidth to run smoothly, such as second MPEG decoder 111, can have their 
requests directed to DDRDRAM 170, which operates at a higher data rate, through memory 
controller 1 50. As previously discussed, in a specific embodiment, memory devices 1 60 and 170 

25 represent the same type of memory, such as DRAM, or different types of memory as needed for 
specific system level implementations. 

The amount of memory available on DRAM 160 and DDRDRAM 170 may also be 
different. Router 130 can route client requests to the respective memory controllers, memory 
controller 140 and memory controller 150, to allow the requests to be received where space is 
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most readily available. Router 130 can also route requests from similar high bandwidth clients to 
different memory controllers to ensure that a single memory device is not overburdened with all 
the high bandwidth requests. For example, in one embodiment, first MPEG decoder 110 and 
second MPEG decoder 1 1 1 both decode HDTV data. To ensure that requests from both first 
5 MPEG decoder 110 and second MPEG decoder 1 1 1 are handled efficiently, requests from first 
MPEG decoder 1 10 are delivered to memory controller 140 while requests from second MPEG 
decoder 1 1 1 are delivered to memory controller 150, allowing the heavy processing workload to 
be divided among the available memory devices, DRAM 160 and DDRDRAM 170. Router 130 
is used to route the requests from different clients of clients 1 50, allowing the memory resources, 
1 0 DRAM 1 60 and DDRDRAM 1 70, to be used efficiently. In a specific implementation, the router 
130 receives and/or accesses one or more storage devices, such as a register (not shown), to 
determine the routing for a specific client. In this implementation, a routing scheme can be 
programmed for a specific system requirement, and may even be controlled dynamically. 

The routed requests are sent to one of at least two memory controllers, memory controller 
15 1 40 or memory controller 1 50. Each memory controller can be used to handle requests requiring 
varying bandwidth. In one embodiment, first MPEG decoder 1 10 decodes SDTV video data and 
second MPEG decoder 1 1 1 decodes HDTV video data. Memory controller 150 can be used, 
along with a high-bandwidth memory device, such as DDRDRAM 170, to handle the higher 
bandwidth requirement of the HDTV data from second MPEG decoder 111. Bus lines used with 
20 DDRDRAM 1 70 can be wider to accommodate a higher bandwidth. Memory controller 1 40 can 
be used to handle requests that are less intensive than those routed to memory controller 150. 
Memory controller 140 can be implemented to support general memory requests such as from 
GUI 1 14, first audio decoder 1 15 or second audio decoder 120. Requests with SDTV data, such 
as from first MPEG decoder 1 1 0, requiring less bandwidth than second MPEG decoder 1 1 1 , can 
25 be sent to memory decoder 140. In another embodiment, one memory controller may be 
dedicated to video and audio decoding while the other memory controller is dedicated to a 
central processing unit (CPU) which may be a system control processor running an operating 
system. 

Memory controllers 140 and 1 50 are preferably "transparent" to clients 105. Clients 105 
30 do not need to alter their data request to accommodate memory controllers 140 and 150. If the 
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data requests sent by clients 105 are of a different format than what memory controllers 140 and 
150 use, memory controllers 140 and 150 can alter the request format themselves, and convert 
any responses so they will be understood by clients 105. Requests to/from clients 105 are 
preferably formatted as if clients 105 were communicating directly with memory. 

5 Within memory controllers 140 and 150, the client requests are sent through an intra- 

client arbiter 145. Intra-client arbiter 145 determines which of the clients, among similar type 
clients, such as first MPEG decoder 110 and second MPEG decoder 111, should be directed to 
the memory device under the control of its memory controller, such as DRAM 160 under the 
control of memory controller 140 and DDRDRAM 170 under the control of memory controller 

10 150. Intra-client arbiter 145 can choose among clients of a similar type based on a priority. 
Using client tags, some client requests may be given priority over another. For example, first 
audio decoder 115 may be decoding MP3 audio data and may be considered a higher priority 
client than second audio decoder 120 which may be decoding CD audio data. The memory 
request from first audio decoder 115 would be processed before the request from second audio 

15 decoder 120. 

The choice of priority may be made due to the amount of data required or the time it 
would take to process the request. Priority can be issued to ensure that the request is processed 
within a certain amount of time required by the client, allowing the processing performed by 
clients 150 to be uninterrupted by the delay in processing a particular request. As different 
20 clients of a specific client type may have different latency requirements, different priorities can 
be assigned to the clients. Latency is the amount of time a given request takes to be processed. 
Client requests should be processed within the assigned latency limits of clients 105 to avoid the 
case in which a request surpasses a client latency limit, thereby preventing a client 105 from 
completing a process. 

25 Client's requests may be ordered into levels of priority, based on the importance of 

processing one request over another. For example, a graphics display request may be processed 
over a general video display request or an audio read/write request. Client requests may be given 
dynamic priority levels, allowing the priority level associated with a client to be changed as 
needed. An identifier within the request, such as the setting of a specific bit, could determine 
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which priority to use. An internal timer could also be used to apply varying levels of priority to a 
specific request at varying times. Alternatively, intra-client arbiter 145 may alternate among 
requests from clients of a certain type. Choosing requests by simply alternating is referred to as 
round robin arbitration. It will be appreciated that other types of arbitration may be performed in 
5 place of or in addition to the types discussed herein. 

After the requests among like clients are selected and ordered by intra-client arbiter 145, 
the client requests are delivered to inter-client arbiter 147. Inter-client arbiter 147 chooses 
among different clients to determine which of the client requests should be processed first. Inter- 
client arbiter 147 can assign priority to different clients, dependent on the type of client they 
1 0 originate from or the size of the data requests made by the client. For example, inter-arbiter 1 47 
may process requests made by first MPEG decoder 110 before processing requests made by first 
audio decoder 115. Alternatively, inter-client arbiter 147 may use round robin arbitration, or 
some other suitable arbitration scheme, to choose among client requests. 

The ordered requests from memory controllers 140 and 150 are delivered to respective 
15 memory devices, DRAM 160 or DDRDRAM 170. State machine 149 can be used to set the 
addresses to access in the memory devices. In one embodiment of the present invention, the 
commands from clients 105 have been ordered by intra-client arbiter 145 and inter-client arbiter 
147 to meet the bandwidth and latency requirements of clients 105. For example, in one 
embodiment, memory resets are given the highest priority, followed by first graphics display 
20 112. The priority list could also include motion compensation related memory requests, for first 
MPEG decoder 110, leaving all other client requests as having the lowest priority, using round 
robin arbitration to choose among them. 

In one embodiment of the present invention, the multiple memory controllers, such as 
memory controllers 140 and 150, are operated in tandem, acting like a single memory channel 
25 with a proportionally higher available memory bandwidth then each of individual memory 
controllers 140 and 150. All requests are routed to intra-client type arbiter 145 and inter-client 
type arbiter 147 of only one memory controller. The state machines 149,used for addressing 
DRAM 1 60 and DDRDRAM 1 70, from their respective memory controllers, memory controllers 
140 and 150, are operated in a synchronous manner. An advantage of this configuration is that 
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the memory bandwidth that can be delivered to a single client is the sum of all memory 
controllers. This configuration may be useful in applications where fewer, high-bandwidth 
clients are needed. The configuration can also be used to allow slower, less-expensive memories 
to achieve a higher memory bandwidth. The memory and controllers used are scalable to the 

5 types of clients and the processing requirements, and the size of memory used can be changed. 
Each of clients 105 may have different requirements of memory size. For example, first MPEG 
decoder 110 may need to write and read from much larger blocks of data to handle video 
processing than first audio decoder 115. To accommodate this efficiently, multiple memory 
devices can be used, and the sizes of the device can be changed. For example, a larger memory 

10 device can be associated with one memory controller to allow first MPEG decoder 110 to 
exclusively use the memory device with the larger size. 

The speed of the memory device and the bus width is also scalable. Using a memory 
device with a wider bus, such as a 64-bit bus compared to a 32-bit bus width, can be used to 
allow devices with larger memory requests to communicate with larger blocks of data at one 

1 5 time. The speed of the memory device can also be changed by using a different memory device. 
For example, DDRDRAM 170 uses internal phase-locked loops (PLLs) to double the clock 
speed and process data at twice the speed of a single data rate memory device, such as DRAM 
160. Memory controller 140 and 150 can be designed to meet the requirements of clients 105 
and the types of memory used and are scalable to efficiently interface with the type of memory 

20 device used. It should be appreciated that different types of memory can be used, such as static 
RAM (SRAM) and static dynamic RAM (SDRAM), without departing from the scope of the 
present invention. 

Referring now to FIG. 2, a method of routing multiple motion pictures experts group 
(MPEG) decoders and multiple audio decoders to two separate memory devices is shown, 
25 according to one embodiment of the present invention. Memory requests are sent by multiple 
clients and are routed to two different memory devices. The requests are ordered by differences 
among clients of similar types and by different client types before being sent to the memory 
device. 

In step 210, a first MPEG decoder sends a request for memory access. In one 
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embodiment, the MPEG data request is related to a high definition television (HDTV) video 
stream. In step 220, a second MPEG decoder also sends a request related to another HDTV 
video stream. In step 230, a first audio decoder sends a request to access memory. In step 240, a 
second audio decoder sends another request for memory access. All the client requests sent in 
5 steps 2 1 0-240 are intercepted by a router. 

In step 250, a router receives and directs the client requests to different memory 
controllers, such as memory controller 140 and memory controller 150 (FIG. 1), associated with 
two different memory devices. In one embodiment, the two memory devices are a DRAM and a 
double data rate DRAM (DDRDRAM). The DDRDRAM can process requests at twice the 
10 speed of the DRAM. Furthermore, in the discussed embodiment, the memory controller 
associated with the DDRDRAM will have twice the available bus width, to process larger 
amounts of data at a time than the DRAM. In a specific embodiment, the memory controllers are 
associated with memory devices of a similar type. 

At step 250, router identifies the client that initiated the request and the memory address 
15 the client requests, to determine which memory controller to route the request to. The requests 
from the MPEG decoders are both high bandwidth HDTV requests that are separated and sent to 
different memory devices. Dividing the requests from the MPEG decoders is done to keep the 
requests from over- taxing a single memory device, allowing them to be processed more quickly. 
Each request is given individual treatment from a different memory device, allowing them to be 
20 processed simultaneously. The memory request from the first MPEG decoder is routed to the 
DRAM and the request from the second MPEG decoder is routed to the DDRDRAM. 

The requests from steps 230 and 240 are also routed. In one embodiment, the requests 
from the audio decoders are both routed to the memory controller associated with the 
DDRDRAM. Since the DDRDRAM can operate at twice the rate of the DRAM and is 
25 associated with the wider bus, the memory can process more requests than the DRAM, in the 
same amount of time. Sending the requests from both audio decoders to the DDRDRAM allows 
the DRAM to be used exclusively for processing the requests from the first MPEG decoder, sent 
in step 210. 

Requests from the same client can also be routed to different memory controllers, 
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allowing the client to access memory address blocks spanning multiple memory modules. For 
some clients, it is important that read data be returned in the same order that it was requested. In 
one embodiment, an interlocking method is used to preserve the order that requests originating 
from the same client are serviced by the two memory controllers. In step 290, an interlock 
5 monitors the activity of each client. When a request arrives and is routed to one memory 
controller, the second memory controller is prohibited from accepting subsequent requests until 
the first memory controller has finished. Note that each memory controller represents a pipeline 
that may contain several requests in various stages of completion. 

In the specific example the arbitration performed by steps 260 and 262 is trivial because 
10 only a single client is being serviced. However, in one embodiment, the operation of the steps 
260 and 262 can be the same as steps 265 and 275, which are described in greater detail below. 
Any requests received from the MPEG decoder 210 are provided to the step 280. 

In step 280, the ordered requests from the first MPEG decoder are processed by the single 
data rate memory device or other memory device. If the requests were associated with a read 
1 5 request, the returned data is sent back to the MPEG decoder, through the memory controller and 
the router. Note, in other embodiments, the read and write data can be provided directly to the 
router from the memory devices. 

In step 265, arbitration between requests from clients of the same type occurs. For 
example, if two audio decoder clients have pending requests, the step 265 will provide only one 

20 of the requests to the next step. In the specific example, if the request from the second MPEG 
decoder and the two audio decoders are received as active requests at the step 265. The MPEG 
decoder request will be provided to the step 275 for further arbitration since it is the only MPEG 
decoder request. However, step 275 will provide only one of the two active audio decoder client 
request to the step 275 . The audio decoder requests can be compared at step 265 for priority. In 

25 one embodiment, the audio decoder requests can be prioritized based upon a round robin 
technique. 

In step 275, the requests having different types are serviced according to specific criteria. 
In one embodiment, the ordering of service can be dependent upon the specific client requesting 
the data. For example, the second MPEG decoder request can be given a higher priority than a 
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request generated by either of the audio decoders. Alternatively, the requests from the clients 
can be processed sequentially based upon a round-robin arbitration scheme. A priority based 
arbitration scheme can prioritize service to requests that are more critical to operation of a 
system, that require more memory, that have a greater or lesser associated latency, or any other 
5 suitable factor. In one embodiment, the types of requests provided by a single MPEG decoder 
include motion compensation read and write requests and HDTV video stream data read and 
write requests. The priority based ordering can place the order of requests as: HDTV stream data 
write; HDTV stream data read; motion compensation read; and motion compensation write. 
Alternatively, the requests can be ordered by simply alternating among the type of request being 
10 processed, as in round robin arbitration. 

In step 285, the requests are sent to the DDRDRAM memory device to be processed. As 
previously discussed, if the request is associated with a read, the read data is returned to the 
client, through the associated memory controller and the router. 

Referring now to FIG. 3, a method of representing MPEG data in memory into a tiled 
1 5 memory structure, by the client, is shown, according to one embodiment of the present invention 
and is generally referred to as tiled memory 300. MPEG data stored by MPEG video decoders is 
arranged to allow blocks of data to be accessed without switching pages within a bank. If 
switching pages within a bank of memory is avoided, latency problems due to the page switch 
can be avoided. 

20 MPEG decoders store image, or frame, information in groups of stored data referred to as 

"blocks". A block 380 of data can be used to represent a rectangular portion of a video image or 
frame. Block 380 can include information related to a set of picture elements (PEL) of a video 
frame. In one embodiment, block 380 represents 16X16 PELs in a video frame. To reduce the 
amount of data needed for representing the PELs, a transformation can be performed on the data 

25 gathered from the PELs. Accordingly, a transformation can be used on block 380 to recreate the 
image, such as through a discrete cosine transform (DCT) of the data in block 380. Sets of 
blocks may be used to represent a single frame of video data. Subsequent frames can be pieced 
together to generate video, such as is done with MPEG decoding. Storing frame information in 
memory, a block 380 of image data may span several areas of memory. 



11 



ATI.0001670 



Multiple memory components can be assigned as memory banks. The banks are assigned 
logical values in the form of logical addresses to represent the portions of the memory. Each 
bank can in turn be logically divided into different pages. Pages within a memory bank can 
represent different sets of memory addresses associated with a row of memory addresses. Page 
5 mode dynamic random access memory is a technique used to support faster data accessing, 
wherein a whole page within a bank is accessed at one time. When accessing an address in 
memory, all the data in the page of memory, in which the address lies, is accessed. The accessed 
page can be stored, such as in a cache, for faster processing. Page mode dynamic random access 
memory allows memory access within a bank page to be performed quickly; however, when an 
1 0 address outside of the page is requested, the new page, associated with the new address, must be 
scanned to switch pages within the bank. Switching pages results in an undesired latency when 
processing data associated with MPEG decoders. 

MPEG decoding clients can store/retrieve data to/from memory by tiling different pages 
from different banks in a configuration such as tiled memory configuration 300. Different pages 

15 from a first bank can be used as "tiles" in tiled memory 300. For example, page 340 (page 1 of 
bank A) is mapped to tile 310 in tiled memory 300. Similarly, page 360 (page 1 of bank B) is 
mapped to tile 311. Pages 350 and 370 are mapped to tiles 312 and 313, respectively. In one 
embodiment, a single tile is used to represent a single page. It should be noted that 2 A n tiles may 
be used to represent a page. For example, with n equal to 1 , two tiles would represent a page and 

20 a single tile would represent half a page. It should also be appreciated that a tile can be 
represented by a plurality of pages. For example, a single tile may represent two pages. 

In a specific application, each tile will store data associated with a block of pixels that are 
displayed consecutively adjacent. In addition, a lower right pixel, represented in one tile, for 
example tile 312, is displayed immediately adjacent to the lower left pixel represent in tile 313. 
25 Therefore, the tiled data is logically accessed using an X-Y coordinate system that is analogous 
to how pixels are used to represent images. 

When mapping adjacent tiles on tiled memory 300, the clients can avoid placing tiles 
associated with different pages within the same bank of memory beside each other. This avoids 
additional latency time associated with accessing data from a different page of the same memory 
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bank. When a client is storing or retrieving a block of data, such as block 380, the client can give 
the upper left address and the lower right address to indicate the block of memory associated 
with block 380. Note that block 380 can span multiple tiles in tiled memory 300. While 
scanning the multiple tiles within block 380, the memory can be forced to alternate access 
5 between different banks and pages. If any change in a page is forced within the same bank, 
latency will have to be incurred to access the new page. To avoid this, the tiles are arranged in 
tiled memory 300 so that no adjacent tiles will include the same bank of memory. 

If pages 345 and 365 are mapped to tiles 322 and 323, all of block 380 can be accessed 
without forcing a change in the page of a bank. While banks A-D and pages 1 and 2 are being 

10 accessed, the page within a bank is not switched over block 3 80. Pages 355 and 375 aremapped 
to tiles 320 and 321, respectively, to avoid page switching when accessing blocks over that 
portion of tiled memory 300. Similarly, the tiles of tiled memory 300 are all arranged so that at 
any given vertex, where four tiles meet, all adjacent tiles indicate different banks of memory. 
Rows are completed to form the rest of tiled memory 300 in the same fashion. In one 

15 embodiment, where there are m pages in banks A-D, pages 347, 367, 357 and 377 are mapped to 
tiles 330, 331, 332 and 333, respectively. It should be noted that additional tiles may be placed 
across the rows of tiled memory 300, as necessary. Furthermore, more than four banks can be 
used to implement a configuration similar to tiled memory 300. The size of the block being 
accessed, such as block 380, can also be varied so long as it does not force a change in page 

20 within similar banks in tiled memory 300. For example, a block having an X or Y dimension 
greater than the X or Y dimension of a tile would generally cause a page change within a bank. 

In implementing a tiled memory configuration, such as tiled memory 300, an equation 
can be used to calculate the tiled surface address, when given an (x,y) coordinate. Using given 
"x" and "y" coordinates in bytes, along with information about the values of "offset" and 
25 "PITCH", an intermediate tiled address, "AITILE", can be calculated. "Offset" refers to the 
starting byte offset to the base address of the tiled surface. "PITCH" refers to the pitch, or width 
of the tiled memory surface. "Tilejieight" refers to the height of a tile in bytes; while 
"tile_width" refers to the width of the tile, in bytes. "AITILE" can be calculated using the 
following equation: 
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AITILE = Offset + {(y div tile_height) * tilejieight * PITCH} +{(x div tile__width) * 
tile_width * tilejieight} +{(y mod tilejieight) * tile_width} 4- (x mod tile_width). 

The tiled surface address can be taken from "AITILE", "AITILE" provides an 
intermediate address for determining the absolute address, referred to as "ATILE", of a tile. The 
memory spanned by a tile, using a configuration such as tiled memory configuration 300, 
includes different memory banks. To determine "ATILE" a bank bit may need to be switched in 
value in its location in "AITILE". Using the size of the tile as "tile_size" the location of the 
bank bit, "bankjrit" is determined according to the following equation: 

Bank J>it = log 2 tilesize. 

In at least one embodiment, "ATILE" is determined by altering the value of the bit 
determined by "bank_bit" according to the following equations: 

ATILE = AITILE; 

ATILE (bank J>it) - ATILE (bank Jut) XOR [(y (log2 (tilejieight)) and not pitch (log2 
(tilejvidth) + 1)]. 

The clients requesting the data can map memory into a tiled configuration, such as tiled 
memory 300. When the client attempts to communicate with memory, a memory controller, 
such as memory controller 140 (FIG. 1), can intercept the memory request and convert it to a 
linear format for processing. Alternatively, a memory controller can be used to format linear 
memory operations into a tiled memory configuration. The use of tiled memory 300 can allow 
improved memory accessing and processing times over linear memory configurations. Such an 
improvement can be used to allow for the use of multiple MPEG decoders to access memory in a 
timely fashion. 

Referring now to FIG. 4, a method of storing different types of image data in different 
memory tiles is shown. A single video image, such as image 41 0 ? is composed of a plurality of 
PELs. The PELs may contain data associated with different color components. In one 
embodiment, the PELs include Luma (Y) data 415, associated with the image intensity, and 
Chroma (UV) data 417, associated with the image color, where UV data 4 1 7 can be composed of 
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separate color channels. Note during video encoding, the Y data 415 and the UV data 417 are 
separated into different data types to store a representation of first image 410. For example, the 
block labeled Yl in plane 415 represents the Y component of a first portion of the image 410, 
while the block labeled Y2 in plane 415 represents the Y component of a second portion of the 
5 image 410 that is adjacent to the first portion. The blocks Ul VI and U2V2 represent the UV 
components of the first and second video blocks. 

As previously discussed, a single block of image information represents a set of PELs 
that form a portion of an image frame. To reduce the amount of data stored for each block, the 
blocks are sub-sampled. Only the Luma and Chroma data associated with some of the PELs are 
10 stored in the block. The human eye has been considered to be more sensitive to changes in 
luminance image information than to chrominance image information. Accordingly, for image 
data compression, less Chroma data is collected than Luma data within a block. In one 
embodiment, a single block represents 1 6X1 6 PELs and only includes four Luma data values and 
two Chroma values. 

15 Blocks of image 410 may be divided into Y data portions and UV data portions. The Y 

data of the image 410 is stored in a first plane, Luma data plane 415, while the UV data is stored 
in a second plane, Chroma data plane 417. For example, a Yl portion of image 410 may 
correspond to the same area of image 410 as an associated U1V1 portion of image 410. An 
offset of image 410 associated with the Yl portion may correspond to an offset of image 410 

20 associated with the Ul VI portion. It should be noted that the offsets may be approximately the 
same, as the Yl portion and the Ul VI portion indicate the same block ofimage 410; however, as 
Y and UV data vectors are slightly different, the offsets may also be slightly different. Note the 
UV data, as shown in FIG. 4, has been compressed, or sub-sampled, so that both the U 
component and the V component corresponding to a tile location can be stored together in one 

25 tile, or part of one tile. In other embodiments, the UV data would not be compressed and each 
component would be stored in its own plane. In another embodiment, where the UV data is 
compressed by at least four times, the UV data of images associated with two tiles can be saved 
in a single tile. For example, referring to FIG. 5, compressed UV data can be stored in tile C5 of 
plane 417 that corresponds to the Y data stored in tiles Al and A3 of plane 415. Note if tile C5 

30 of plane 417 stored data corresponding to tiles Al and C2, a latency conflict would occur when 
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both C5 and C2 are being accessed. 

To decode the image associated with portion Yl of image 410, the Y data of tile Yl 430 
is combined by overlaying the Y data with the UV data of tile Ul VI 440, by a decoder, to render 
an output image, as discussed further in FIG. 6. In one embodiment, when the image portion 
5 being represented by tiles Yl 430 and U1V1 440 is being rendered, the entire image portion 
stored in Luma data plane 415 is read before reading the image portion in Chroma plane 417. 
Once all the data related to the image portion is read, the image portion is rendered. In another 
embodiment, the image data is accessed on a word-by-word basis alternating between the Luma 
data plane 415 and the Chroma data plane 417. For example, for each word accessed from tile 
10 Yl 430 a corresponding word from tile Ul VI 440 is accessed. In this manner, the data from 
tiles Yl 430 and U1V1 440 can be provided to a decoder for generation of the final image. 
Similarly, data from tile Y2 435 and tile U2V2 445 is provided to the decoder to render another 
portion of image 410. 

To avoid additional latency due to page faults when accessing multiple planes of data, 
15 corresponding tile locations between planes need to be in different banks. Figure 5 illustrates 
non-overlapping planar memory partitions. Specifically, if block 5 1 0 represents an image block 
being accessed, the tiles in planes 415 and 41 7 are partitioned among the memory banks to avoid 
additional latency. 

For example, when a top left most pixel of block 5 10 is being accessed, banks Al Al and 
20 C5 C5 will be accessed. In one embodiment, the Luma data for block 510 is accessed from 
Luma plane 415, followed by the Chroma data being accessed from Chroma plane 417. Specific 
methods may be employed for improving memory access efficiency when reading image data 
from memory. For example, if the pixels are read by rows from left to right across block 5 1 0, no 
page conflicts occur. Likewise, as rows of pixels are read from top to bottom, in Luma plane 
25 4 1 5 , page conflicts are reduced due to the non-overlapping bank structure indicated. However, if 
the Chroma data of block 5 10 is read from top to bottom, in Chroma plane 417, after the Luma 
data in C2 Al and D2 C2 is read from Luma plane 415, page conflicts will occur as C2 Al and 
D2 C2 in Luma plane 415, share banks with C5 C5 and D5A6, respectively, in Chroma plane 
417. In one embodiment, to avoid page conflicts when reading image data, blocks read in 
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Chroma plane 417 are read in the opposite direction as blocks read in the Luma plane 415. For 
example, if block 510 is read from top to bottom in Luma plane 415, from tile Al to tile C2, 
block 510 is read from bottom to top in Chroma plane 517, from tile A6 to tile C5. 

It should be noted that a block stored in planes 41 5 and 417, such as block 5 1 0, may be 
5 mostly contained in a single tile. For example, in Luma plane 415, block 5 1 0 may be contained 
mostly in tile A1A1, with a minimal portion of block 510 overlapping into tile C2C2. 
Depending on the amount of block 5 1 0 stored in tile C2 of Luma plane 41 5, all of block 5 1 0 may 
be contained in tile C5, in Chroma plane 417, with none of block 510 stored in tile A6. If the 
Luma data read from tile Al to C2 in Luma plane 41 5 is followed by reading tile C5 of Chroma 

10 plane 41 7, a page conflict will occur. Therefore, whenever a minimal portion of a block has been 
stored in a tile row in Luma plane 415, below a set threshold, the block is read in the direction 
from the row with the minimal portion of the block to the row with the larger portion stored. 
Accordingly, reading data in Chroma plane 417 is performed in the opposite direction. For 
example, if most of block 5 1 0 has been stored in tile Al in Luma plane 415, block 5 1 0 is read in 

15 Luma plane 415 from bottom to top, from tile C2 to tile Al . Corresponding Chroma data stored 
in Chroma plane 417 is read from top to bottom. It will be appreciated that other methods may 
be employed for improving memory access efficiency and the selection of one specific method 
over the other can be made without departing from the scope of the present invention. In one 
embodiment, an offset used in storing block 510 in Luma plane 415 may correspond to and be 

20 approximately the same as an offset used in storing block 5 1 0 in Chroma plane 417. 

In another embodiment of this invention, the page conflict between consecutive block 
accesses is also eliminated. If, for example, two identical blocks, 510 of FIG. 5 are being 
requested in series, and the access order was C2 to Al on Luma plane 415 followed by C5 on 
Chroma plane 417 as described above, then the resulting access pattern is C2, Al, C5. If a 

25 second identical block is accessed, a page conflict will occur in going from C5 for the Chroma 
access of the first block to C2 for the Luma access of the second block. If the portion of the 
Chroma block 5 1 0 in Chroma plane 41 7 is wholly contained in page C5, then the portion of the 
Luma block 5 1 0 in Luma plane 4 1 5 that lies in page C2 will be minimal The Luma block 5 1 0 
in the Luma plane 415 can be fetched by accessing a few rows from Al , followed by the minimal 

30 portion from C2, followed by the remaining portion of Luma block 510 from AL Then the 
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Chroma block 510 is fetched from C5. The resulting access pattern is Al, C2, Al , C5, and has 
no internal page faults, and may be followed by an identical block 510 access without 
encountering a page fault between C5 of the first block and Al of the second block. 

In another embodiment, the page conflict between consecutive block accesses is further 
5 eliminated by considering the departure bank of the preceding memory access. In the previous 
example, the Al, C2, Al , C5 access pattern was used to hide the single row access to C2 and at 
the same time, allow the entry and departure banks to be different. It should be appreciated that 
the same block 510 could also be fetched without page conflicts by reversing the access pattern 
by fetching the Chroma block 510 in Chroma plane 417 before the Luma block 510 in Luma 
10 plane 417 with the access pattern of C5, Al, C2, Al . This symmetry can be used to advantage if 
the previous memory access happened to depart from Bank A instead of Bank B, C, or D. 

Referring now to FIG. 6, Luma and Chroma data associated with different video frames 
are collected among three frame buffers, according to at least one embodiment of the present 
invention. Luma data, received through frame buffers 0-2, is stored in a collection of Luma 
15 planes 660 while Chroma data, received through frame buffers 0-2, is stored in a collection of 
Chroma planes 680. 

Portions of a single image frame may be stored as separate blocks of data, wherein the 
data includes Luma (Y) data and Chroma (UV) data. As previously discussed, compression of 
the data within a single block may be performed using data transformation, such as through the 
20 discrete cosine transform (DCT), or by sub-sampling the Y and UV data. Accordingly, further 
compression may be performed among sets of consecutive frames, or video. A single frame may 
be compressed through intra-frame coding, according to differences among blocks of picture 
elements (PEL) within a single frame. 

Since corresponding blocks in subsequent frames may not change, inter- frame coding 
25 may also be performed, wherein differences among blocks of sequential frames are used in data 
compression. A method of temporal prediction may be used for inter-frame coding of MPEG 
video data. Initially, a set of blocks corresponding to a first frame of data is transmitted. The 
data may include intra-frame coding among the blocks and the frame is generally referred to as 
an I-frame. Once the I-frame information has been sent, a frame with prediction data, referred to 
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as a P-frame, may be transmitted. The P-frame data includes prediction vectors for the blocks in 
the preceding frame, such as motion vectors relating to blocks within a previously sent I-frame. 

In order to allow a decoder to appropriately decode a video from nearly any point in the 
video transmission, multiple I-frames must be given to provide a reference. Additionally, bi- 
5 directional, or B-frames, may be transmitted to provide prediction information for previous and 
upcoming frames, allowing prediction information to be gathered when playing stored video data 
in reverse. B-frames may also be sent intra-coded, without any prediction information. It should 
be noted that B-frames are never used as references for other frames using temporal prediction. 

Compressed video information is inherently variable in nature. This is caused by the 
10 variable content of successive video frames. To store video at a constant bit rate it is therefore 
necessary to buffer the variable bitstream generated in the encoder in frame buffers. Incoming 
video data can be stored in frame buffers 0-2 and stored in memory, Y planes 660 and UV planes 
680, at a constant rate. Each buffer 0-2 can be used to represent a sequential frame of video data. 

In one embodiment, while Y data is stored in Y plane 660 in blocks of tiles representing 
15 different buffers, the UV data from buffers 0 and 1 is interweaved. The UV data is stored in 
alternating locations within tiles. In UV planes 680, UV data associated with buffers 0 and 1 are 
stored in alternating locations within tiles. The interweaving of UV data may allow simplified 
access among frames of prediction data by allowing an offset within a Y block to be used. For 
example, in one embodiment, buffer 0 relates to I-frame data, buffer 1 relates to P-frame data, 
20 and buffer 2 relates to B-frame data. When inter-coded prediction data is being decoded, 
information in a P-frame must be referenced to a corresponding I-frame to properly decode the 
frame associated with the P-frame. Accordingly, the interweaved structure of UV planes 680 
allows the I-frame data in plane 682 to be easily referenced when decoding the P-frame data in 
plane 684 by using approximately the same offset value. 

25 B-frame UV data, associated with buffer 2, is not used for reference by I- or P-frames. 

Therefore, it is not necessarily advantageous to interweave the B-frame data with the I- or P- 
frame data. Accordingly the data from buffer 2 is simply stored in sequential planes. In 
addition, the Y data from buffers 0-2 is also simply stored in blocks of planes associated with the 
different buffers. In one embodiment, the Y data associated with a single frame is twice as large 
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as the UV data and interweaving the Y data offers little to no advantage. 

It will be appreciated that other frame types may also be sent. For example, in one 
embodiment, frame data is encoded into D-frames, or display frames, which are stored in a 
separate frame buffer. Accordingly, it should also be appreciated that additional frame buffers 
5 may be used. In on embodiment, four frame buffers are used and a fourth buffer, buffer 3 (not 
shown) is introduced. The UV data associated with buffers 2 and 3 are interleaved in the 
manner illustrated for buffers 0 and 1. 

The various components present in the present application may be implemented using an 
information handling machine such as a data processor, or a plurality of processing devices. 
10 Such a data processors may be a microprocessor, microcontroller, microcomputer, digital signal 
processor, state machine, logic circuitry, and/or any device that manipulates digital information 
based on operational instruction, or in a predefined manner. Generally, the various functions, 
and systems represented by block diagrams are readily implemented by one of ordinary skill in 
the art using one or more of the implementation techniques listed above. 

15 When a data processor for issuing instructions is used, the instruction may be stored in 

memory. Such a memory may be a single memory device or a plurality of memory devices. 
Such a memory device may be read-only memory device, random access memory device, 
magnetic tape memory, floppy disk memory, hard drive memory, external tape, and/or any 
device that stores digital information. Note that when the data processor implements one or 

20 more of its functions via a state machine or logic circuitry, the memory storing the corresponding 
instructions may be embedded within the circuitry comprising of a state machine and/or logic 
circuitry, or it may be unnecessary because the function is performed using combinational logic. 

Such an information handling machine may be a system, or part of a system, such as a 
computer, a personal digital assistant (PDA), a hand held computing device, a cable set-top box, 
25 an Internet capable device, such as a cellular phone, and the like. 

It will be appreciated that in other embodiments, the memory controllers can intercept 
memory requests along a peripheral component interconnect (PCI) bus. The memory used to 
implement the invention may be altered from the types discussed, static RAM (SRAM), single 
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data rate RAM, and other forms of memory may be used. Furthermore, other types of client 
requests and other types of clients can be incorporated, without departing from the scope of the 
present invention. The memory controllers can be used to perform the method of tiling memory 
described herein. Furthermore, the method of tiling memory may be performed by a software 
driver, such as an application peripheral interface (API). It should now be appreciated by those 
skilled in the art that the present invention has the advantage that requests from multiple motion 
pictures experts group (MPEG) video decoders can be handled and ordered for improved 
efficiency in communicating with memory. 

In the preceding detailed description of the preferred embodiments, reference has been 
made to the accompanying drawings which form a part thereof, and in which is shown by way of 
illustration specific preferred embodiments in which the invention may be practiced. These 
embodiments are described in sufficient detail to enable those skilled in the art to practice the 
invention, and it is to be understood that other embodiments may be utilized and that logical, 
mechanical, chemical and electrical changes may be made without departing from the spirit or 
scope of the invention. To avoid detail not necessary to enable those skilled in the art to practice 
the invention, the description may omit certain information known to those skilled in the art. 
Furthermore, many other varied embodiments that incorporate the teachings of the invention 
may be easily constructed by those skilled in the art. Accordingly, the present invention is not 
intended to be limited to the specific form set forth herein, but on the contrary, it is intended to 
cover such alternatives, modifications, and equivalents, as can be reasonably included within the 
spirit and scope of the invention. The preceding detailed description is, therefore, not to be taken 
in a limiting sense, and the scope of the present invention is defined only by the appended 
claims. 
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